feature expectation
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > Canada > Alberta (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
Leveraging Counterfactual Paths for Contrastive Explanations of POMDP Policies
Kraske, Benjamin, Laouar, Zakariya, Sunberg, Zachary
As humans come to rely on autonomous systems more, ensuring the transparency of such systems is important to their continued adoption. Explainable Artificial Intelligence (XAI) aims to reduce confusion and foster trust in systems by providing explanations of agent behavior. Partially observable Markov decision processes (POMDPs) provide a flexible framework capable of reasoning over transition and state uncertainty, while also being amenable to explanation. This work investigates the use of user-provided counterfactuals to generate contrastive explanations of POMDP policies. Feature expectations are used as a means of contrasting the performance of these policies. We demonstrate our approach in a Search and Rescue (SAR) setting. We analyze and discuss the associated challenges through two case studies.
- North America > United States > Colorado > Boulder County > Boulder (0.16)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Mexico > Guanajuato (0.04)
- (3 more...)
Inverse Reinforcement Learning through Structured Classification Supélec - IMS-MaLIS Research Group Nancy, France
This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
Active Third-Person Imitation Learning
Klein, Timo, Weinberger, Susanna, Singla, Adish, Tschiatschek, Sebastian
We consider the problem of third-person imitation learning with the additional challenge that the learner must select the perspective from which they observe the expert. In our setting, each perspective provides only limited information about the expert's behavior, and the learning agent must carefully select and combine information from different perspectives to achieve competitive performance. This setting is inspired by real-world imitation learning applications, e.g., in robotics, a robot might observe a human demonstrator via camera and receive information from different perspectives depending on the camera's position. We formalize the aforementioned active third-person imitation learning problem, theoretically analyze its characteristics, and propose a generative adversarial network-based active learning approach. Empirically, we demstrate that our proposed approach can effectively learn from expert demonstrations and explore the importance of different architectural choices for the learner's performance.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Overview (0.93)
- Research Report > New Finding (0.68)
- Instructional Material > Course Syllabus & Notes (0.54)
- Leisure & Entertainment (0.68)
- Education (0.66)
Joint Path planning and Power Allocation of a Cellular-Connected UAV using Apprenticeship Learning via Deep Inverse Reinforcement Learning
Shamsoshoara, Alireza, Lotfi, Fatemeh, Mousavi, Sajad, Afghah, Fatemeh, Guvenc, Ismail
This paper investigates an interference-aware joint path planning and power allocation mechanism for a cellular-connected unmanned aerial vehicle (UAV) in a sparse suburban environment. The UAV's goal is to fly from an initial point and reach a destination point by moving along the cells to guarantee the required quality of service (QoS). In particular, the UAV aims to maximize its uplink throughput and minimize the level of interference to the ground user equipment (UEs) connected to the neighbor cellular BSs, considering the shortest path and flight resource limitation. Expert knowledge is used to experience the scenario and define the desired behavior for the sake of the agent (i.e., UAV) training. To solve the problem, an apprenticeship learning method is utilized via inverse reinforcement learning (IRL) based on both Q-learning and deep reinforcement learning (DRL). The performance of this method is compared to learning from a demonstration technique called behavioral cloning (BC) using a supervised learning approach. Simulation and numerical results show that the proposed approach can achieve expert-level performance. We also demonstrate that, unlike the BC technique, the performance of our proposed approach does not degrade in unseen situations.
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Arizona > Coconino County > Flagstaff (0.04)
- Telecommunications (1.00)
- Energy (0.67)
- Education (0.67)
- (3 more...)
Learning Safety Constraints from Demonstrations with Unknown Rewards
Lindner, David, Chen, Xin, Tschiatschek, Sebastian, Hofmann, Katja, Krause, Andreas
We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with possibly different reward functions. While previous work is limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstrations with different unknown rewards without knowledge of the environment dynamics. CoCoRL constructs a convex safe set based on demonstrations, which provably guarantees safety even for potentially sub-optimal (but safe) demonstrations. For near-optimal demonstrations, CoCoRL converges to the true safe set with no policy regret. We evaluate CoCoRL in tabular environments and a continuous driving simulation with multiple constraints. CoCoRL learns constraints that lead to safe driving behavior and that can be transferred to different tasks and environments. In contrast, alternative methods based on Inverse Reinforcement Learning (IRL) often exhibit poor performance and learn unsafe policies.
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Austria > Vienna (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- (4 more...)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.66)
- Automobiles & Trucks (0.67)
- Transportation > Ground > Road (0.46)
IRL with Partial Observations using the Principle of Uncertain Maximum Entropy
Bogert, Kenneth, Gui, Yikang, Doshi, Prashant
The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world applications that use noisy sensors computing the feature expectations may be challenging due to partial observation of the relevant model variables. For example, a robot performing apprenticeship learning may lose sight of the agent it is learning from due to environmental occlusion. We show that in generalizing the principle of maximum entropy to these types of scenarios we unavoidably introduce a dependency on the learned model to the empirical feature expectations. We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution generalized from the principle of latent maximum entropy. Finally, we experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
- North America > United States > North Carolina > Buncombe County > Asheville (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments
Bogert, Kenneth, Doshi, Prashant
Robots learning from observations in the real world using inverse reinforcement learning (IRL) may encounter objects or agents in the environment, other than the expert, that cause nuisance observations during the demonstration. These confounding elements are typically removed in fully-controlled environments such as virtual simulations or lab settings. When complete removal is impossible the nuisance observations must be filtered out. However, identifying the source of observations when large amounts of observations are made is difficult. To address this, we present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations thereby explicitly modeling the diverse observations a robot may receive. We extend an existing IRL algorithm originally designed to work under partial occlusion of the expert to consider the diverse observations. In a simulated robotic sorting domain containing both occlusion and confounding elements, we demonstrate the model's effectiveness. In particular, our technique outperforms several other comparative methods, second only to having perfect knowledge of the subject's trajectory.
- North America > United States > North Carolina > Buncombe County > Asheville (0.14)
- North America > United States > Georgia > Clarke County > Athens (0.14)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)